A Multi-phase Semi-supersense Tagging of Korean Unknown Nouns

Authors

  • Young-Bum Kim
  • Jung-Kuk Lee
  • Yu-Seop Kim
Abstract

Supersense tagging is the problem of assigning a word its corresponding semantic super tag (e.g., Phenomenon, Act), typically on the basis of syntactic information and annotated corpora. We instead rely on semantic information, because Korean has a relatively flexible syntactic structure and lacks annotated corpora. To build an automatic sense tagging system for Korean, we use the semi-supersenses of the first and second levels of Sejong’s Noun Semantic Class System. We employ a hybrid approach consisting of three phases: one morphological matching phase and two semantic matching phases. The morphological phase is based on suffix pattern matching, which assigns a compound word to the class that includes its suffix word. One of the two semantic matching phases is based on concept similarity in WordNet, and the other is based on term similarity in a term matrix reduced by singular value decomposition (SVD). Both semantic phases use a weighted k-Nearest Neighbor classifier, but with different similarity metrics. In our experiments, 79,103 unknown words were extracted from the 225,779 nouns in a syntactically tagged corpus, and 98% of the unknown words were handled by our hybrid method.
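As a rough illustration of the cascade described in the abstract, the sketch below tries a suffix match first and falls back to a similarity-weighted k-Nearest Neighbor vote. It is not the authors' implementation: the function names, the toy English words, the two-dimensional vectors, and the labels other than Act and Phenomenon are all assumptions made for the example, and a single cosine-similarity phase stands in for the paper's two separate semantic phases (WordNet concept similarity and SVD-reduced term similarity).

```python
from collections import defaultdict
from math import sqrt


def suffix_match(word, suffix_classes):
    """Return the class of the longest known suffix word of `word`, or None.

    `suffix_classes` is a hypothetical dict mapping known nouns to classes.
    """
    best = None
    for suffix, label in suffix_classes.items():
        if word != suffix and word.endswith(suffix):
            if best is None or len(suffix) > len(best[0]):
                best = (suffix, label)
    return best[1] if best else None


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def weighted_knn(query, labeled, k=3):
    """Pick the class whose k nearest neighbors have the largest summed similarity.

    `labeled` is a list of (vector, label) pairs for known nouns.
    """
    scored = sorted(
        ((cosine(query, vec), label) for vec, label in labeled),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]
    votes = defaultdict(float)
    for similarity, label in scored:
        votes[label] += similarity  # each neighbor votes with its similarity
    return max(votes, key=votes.get) if votes else None


def tag(word, vector, suffix_classes, labeled, k=3):
    """Morphological phase first; semantic (kNN) phase only if it fails."""
    label = suffix_match(word, suffix_classes)
    if label is not None:
        return label
    return weighted_knn(vector, labeled, k)


# Toy data, invented for illustration only.
suffix_classes = {"school": "Organization", "book": "Artifact"}
labeled = [
    ([1.0, 0.0], "Act"),
    ([0.9, 0.1], "Act"),
    ([0.0, 1.0], "Phenomenon"),
]

print(tag("notebook", [0.0, 1.0], suffix_classes, labeled))  # suffix phase decides
print(tag("tree", [0.0, 1.0], suffix_classes, labeled, k=1))  # kNN phase decides
```

Running the cheap, high-precision suffix phase first means the similarity search is needed only for words that the morphology cannot resolve, which is one plausible reading of why the abstract orders the phases this way.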


Similar resources

Supersense Tagging of Unknown Nouns Using Semantic Similarity

The limited coverage of lexical-semantic resources is a significant problem for NLP systems which can be alleviated by automatically classifying the unknown words. Supersense tagging assigns unknown nouns one of 26 broad semantic categories used by lexicographers to organise their manual insertion into WORDNET. Ciaramita and Johnson (2003) present a tagger which uses synonym set glosses as anno...


Supersense Tagging of Unknown Nouns in WordNet

We present a new framework for classifying common nouns that extends named-entity classification. We used a fixed set of 26 semantic labels, which we called supersenses. These are the labels used by lexicographers developing WordNet. This framework has a number of practical advantages. We show how information contained in the dictionary can be used as additional training data that improves accur...


UFRGS&LIF at SemEval-2016 Task 10: Rule-Based MWE Identification and Predominant-Supersense Tagging

This paper presents our approach towards the SemEval-2016 Task 10 – Detecting Minimal Semantic Units and their Meanings. Systems are expected to provide a representation of lexical semantics by (1) segmenting tokens into words and multiword units and (2) providing a supersense tag for segments that function as nouns or verbs. Our pipeline rule-based system uses no external resources and was imp...


Description and Results of the SuperSense Tagging Task

SuperSense tagging (SST) is a Natural Language Processing task that consists in annotating each significant entity in a text, like nouns, verbs, adjectives and adverbs, within a general semantic taxonomy defined by the WordNet lexicographer classes (called SuperSenses). SST can be considered as a task half-way between Named-Entity Recognition (NER) and Word Sense Disambiguation (WSD): it is an ...


CS388 Final Project Writeup: Unknown Noun Supersense Acquisition using Search Engine Queries

We describe a novel method for tagging unknown nouns with supersenses using search engine queries to refine an initial precomputed approximation. Supersense is the name given in the literature to highest meaningful level in the WordNet [5] hierarchy—these are abstract concepts such as food, person, or cognition [3]. We provide a novel solution to the problem which uses a model built from a rela...



Publication date: 2012